Similarity Searching in Text Databases with Multiple Field Types

نویسندگان

  • Kostas Tzeras
  • Euripides G. M. Petrakis
چکیده

Similarity searching in text databases with multiple field types is still an open problem. We experimented with CORDIS and we evaluated the effectiveness of many text retrieval methods in terms of precision, recall and ranking quality.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Similarity searching in the CORDIS text database

Similarity searching in text databases with multiple field types is still an open problem. We focus our attention on the “COmmunity Research and Development Information Service” (CORDIS) database of the European Union and we evaluate the effectiveness of many text retrieval methods in terms of precision, recall and ranking quality. Our experiments indicate that different field types should be h...

متن کامل

Chemical Similarity Searching

This paper reviews the use of similarity searching in chemical databases. It begins by introducing the concept of similarity searching, differentiating it from the more common substructure searching, and then discusses the current generation of fragment-based measures that are used for searching chemical structure databases. The next sections focus upon two of the principal characteristics of a...

متن کامل

The Protein Information Resource (PIR)

The Protein Information Resource (PIR) produces the largest, most comprehensive, annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Sequence Database (JIPID). The expanded PIR WWW site allows sequence similarity and text sea...

متن کامل

Flexible String Matching Against Large Databases in Practice

Data Cleaning is an important process that has been at the center of research interest in recent years. Poor data quality is the result of a variety of reasons, including data entry errors and multiple conventions for recording database fields, and has a significant impact on a variety of business issues. Hence, there is a pressing need for technologies that enable flexible (fuzzy) matching of ...

متن کامل

Identification of BKCa channel openers by molecular field alignment and patent data-driven analysis

In this work, we present the first comprehensive molecular field analysis of patent structures on how the chemical structure of drugs impacts the biological binding. This task was formulated as searching for drug structures to reveal shared effects of substitutions across a common scaffold and the chemical features that may be responsible. We used the SureChEMBL patent database, which prov...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999